ABSTRACT

With faster connection speeds, Internet users are turning social networks into a huge reservoir of texts, images and video clips (GIFs). Sentiment analysis of such online platforms can be used to predict political elections, evaluate economic indicators and so on. However, GIF sentiment analysis is quite challenging, not only because it hinges on abstracting spatio-temporal visual content, but also because the relationship between such abstractions and the final sentiment remains unknown. In this paper, we set out to uncover this relationship. We propose a SentiPair Sequence based spatio-temporal visual sentiment ontology, which forms the mid-level representation for GIF sentiment. SentiPairs are established in two steps. First, we construct the Synset Forest to define the semantic tree structure of visual sentiment label elements. Then, using the Synset Forest, we select and combine sentiment label elements to form a mid-level visual sentiment representation. Our experiments indicate that SentiPair outperforms other competing mid-level attributes. Using SentiPair, our analysis framework achieves a satisfactory prediction accuracy of 72.6%. We have also released our dataset (GSO-2015) to the research community. GSO-2015 contains more than 6,000 manually annotated GIFs selected from more than 40,000 candidates; each is labeled with both its sentiment and its SentiPair Sequence.

Categories and Subject Descriptors 

H.3.3 [Information Storage and Retrieval]: Information Retrieval and Indexing
1. INTRODUCTION

Nowadays, Internet users are turning every social network (such as Facebook, Twitter and Weibo) into a huge reservoir of text, images and video clips (GIFs). People share their lives and opinions on online platforms, which opens new directions for applications such as political election prediction, economic indicator measurement and policy feedback evaluation.

Figure 1: Sentiment analysis behind political elections

With faster Internet connections, people are more willing to post GIF videos rather than static images. According to a recent study [1], visual content accounts for 36% of all links shared on Twitter. However, industry practice deviates from this user habit by treating social network sentiment analysis as social network text analysis. Research on GIF sentiment analysis is still in its infancy, partly because of the inherent features of online GIF videos. GIF videos spreading in online social networks share some common features.

Noise: Junk information can interfere with the judgment of GIF sentiment. We studied the composition of GIF videos: among 6,000 GIF videos, around 71.55% are mixed with other forms of information. The most common noise (around 34.49%) is explanatory text, which can reverse the GIF's sentiment. Other noise includes motion blur, unexpected illumination changes and so on.

Irregular Length: The length of online GIF videos is irregular. The average length of the 40,000 GIF videos we collected is 17.82 s, with the longest GIF video at 3'13''22 and the shortest at 0'0''3a/. This irregular length makes GIF videos difficult to analyze.

Symbolism: The final sentiment judgment does not rely on the objects that appear in the GIF, but on the hidden message those objects symbolize. For example, Sylvester Stallone symbolizes bravery: in a video where Stallone shows up, it is bravery that influences the final sentiment judgment.

Abstraction: GIF sentiment hinges on high-level abstractions. Take the GIF video shown in Figure 4 as an example: its sentiment changes with the variation of the girl's facial expression. In general, several problems must be solved in order to utilize such abstractions.

1. How to describe abstractions: a mid-level representation system has to be designed.

2. How to detect abstractions: given such a representation system, we should be able to detect the abstractions.

3. How to get sentiment from abstractions: the relationship between abstractions and the final sentiment judgment should be modeled.

Figure 2: The architecture of the GSO framework (components: Frame Slicer, ANP/VNP Extractor, Synset Forest, Key Frame Extractor, SentiPair Synthesizer, Senti-Score DB; data flow: GIF image, key frame, GSO ontology, SentiPair, sentiment)

In short, GIF sentiment analysis is a complex and challenging problem, but we treat it as a two-step process. The first step is to extract mid-level representations for the abstractions (e.g., Smile Face, Body Movement; we call them SentiPairs) and to form a SentiPair Sequence by combining SentiPairs in the order of their occurrence. In the second step, the model we built makes a sentiment judgment based on the SentiPair Sequence. Because the accuracy of object detection, object tracking and related techniques is still limited, it is not easy to track all the abstractions simultaneously, so we approach the problem from another angle.

By generating a collection of mid-level representations, modeling the relationship between SentiPairs and sentiment, and leaving the first step to other fast-developing research fields (video classification, etc.), we can obtain a GIF video's sentiment. To that end, we address two major challenges in this work:

1. Finding a common and effective representation for GIF videos.

2. Modeling the relationship between these representations and the final sentiment judgment.

In particular, we make the following contributions:

a. We propose a new mid-level representation, the SentiPair Sequence. The SentiPair Sequence is based on a semantic tree structure of visual sentiment label elements, the Synset Forest. With the SentiPair Sequence and the Synset Forest, we built the GIF Sentiment Ontology (GSO). Our experiments indicate that SentiPair outperforms other mid-level attributes and that our framework achieves satisfactory detection accuracy.

b. We built a large manually labeled GIF sentiment dataset (the GSO-2015 dataset) from 40,000+ GIF videos spread across social networks. This dataset contains not only each GIF's sentiment but also its SentiPair Sequence. GSO-2015 is free and will be open to the public to promote further research on visual sentiment.


Figure 3: Workflow of the GSO framework


2. RELATED WORK 

Recent studies on sentiment analysis fall into three categories: subject-free image sentiment analysis, subject-specified video sentiment analysis and subject-free video sentiment analysis.

For subject-free image sentiment analysis, [2] used a Progressive CNN and bypassed mid-level features. However, because of the abstract nature of visual sentiment, the number of neurons and connections is huge. A deep network needs a vast amount of low-noise labeled training instances to tune an equally vast number of neurons; otherwise, it may get stuck in a local optimum. Both [3] and [T] proposed to employ mid-level entities or attributes as features for subject-free image sentiment analysis. In [3], 1,200 adjective noun pairs (ANPs), which may correspond to different levels of different emotions, are extracted. These ANPs are used as queries to crawl images from Flickr. Next, pixel-level features of the images for each ANP are employed to train 1,200 ANP detectors. The responses of these 1,200 classifiers can then be considered as mid-level features for visual sentiment analysis. The work in [1] employed a similar mechanism; the main difference is that 102 scene attributes are used instead.
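The detector-bank idea can be made concrete with a minimal sketch that turns one image's pixel-level feature vector into a mid-level feature vector of detector responses. It assumes each ANP detector is a linear model with a logistic output; the detector names, weights and the two-entry bank are hypothetical toy values, not those of [3], where the bank holds 1,200 trained detectors.

```python
import math
from typing import List, Sequence, Tuple

# (ANP name, weight vector, bias): a hypothetical linear detector.
Detector = Tuple[str, Sequence[float], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detector_response(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Logistic response of one linear detector to a pixel-level feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def mid_level_features(x: Sequence[float], detectors: List[Detector]) -> List[float]:
    """One response per detector, in a fixed order, forms the mid-level feature vector."""
    return [detector_response(w, b, x) for _, w, b in detectors]


# Toy bank with made-up weights (two ANPs instead of 1,200).
bank: List[Detector] = [
    ("cute dog", [2.0, -1.0], 0.0),
    ("dark street", [-1.0, 3.0], -1.0),
]
features = mid_level_features([0.5, 0.5], bank)
print(features)
```

The resulting vector, one entry per concept, is what a standard sentiment classifier would then be trained on.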

For subject-specified video sentiment analysis, [3] proposed a framework that uses video sound and facial expression to analyze interview clips. They focused on sentiment analysis of videos with fixed content, similar patterns and an average level of noise. The experimental results are promising, but because the subject is specified, the method cannot be used to handle large-scale GIF videos from social networks.

For subject-free video sentiment analysis, [5] proposed using features such as color histograms to train a framework for online GIF sentiment analysis. They also proposed a valuable GIF emotion dataset. However, since subject-free video sentiment analysis hinges on several unsolved problems, namely object classification, facial expression recognition, etc., we cannot rely on a unified framework for such an abstract problem.
Figure 4: SentiPair Sequence

3. GSO FRAMEWORK 

We aim to uncover the essential relationship between abstractions and sentiment in GIF videos. To achieve that goal, we developed an auto-generated ontology based on a vast amount of user-generated content, and we developed the GSO Framework to implement the ontology. The architecture of the GSO Framework is shown in Figure 2. The details of the proposed framework are described in the following sections.
3.1 SentiPair Sequence

When considering the problem of GIF sentiment analysis, we should first figure out a way to represent a GIF video. A good representation should satisfy several criteria:

1. Descriptive: it should be able to describe the abstractions.

2. Detectable: it should be easy to detect.

3. Easy: it should be easy to understand.

4. Flexible: it should be extensible.

To satisfy these criteria, we introduce the SentiPair Sequence. A SentiPair Sequence is a sequence of SentiPairs, where SentiPair is the joint name for Adjective Noun Pairs (ANPs) and Verb Noun Pairs (VNPs). Each SentiPair refers to either a concrete concept such as smile face or a specific motion such as falling cup. In a SentiPair Sequence, SentiPairs are sorted by the order of their occurrence.
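The definition above can be captured in a small data structure. This is a minimal sketch, assuming each detected SentiPair carries the index of the frame in which it occurs; the class, field and function names are hypothetical, and the naming rule simply mirrors the Figure 4 examples (Lovely Girl for an ANP, Girl Frown for a VNP).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SentiPair:
    kind: str      # "ANP" (adjective + noun) or "VNP" (verb + noun)
    modifier: str  # the adjective of an ANP or the verb of a VNP
    noun: str
    frame: int     # hypothetical: index of the frame where the pair occurs

    def __post_init__(self) -> None:
        if self.kind not in ("ANP", "VNP"):
            raise ValueError(f"unknown SentiPair kind: {self.kind!r}")

    def name(self) -> str:
        # Mirrors the naming in Figure 4: "Lovely Girl" (ANP), "Girl Frown" (VNP).
        if self.kind == "ANP":
            return f"{self.modifier} {self.noun}"
        return f"{self.noun} {self.modifier}"


def senti_pair_sequence(pairs: Iterable[SentiPair]) -> List[SentiPair]:
    """Sort SentiPairs by order of occurrence; sorted() is stable, so ties keep input order."""
    return sorted(pairs, key=lambda p: p.frame)


# The Figure 4 example, detected out of order, with made-up frame indices.
detected = [
    SentiPair("VNP", "Shout", "Girl", 30),
    SentiPair("ANP", "Lovely", "Girl", 0),
    SentiPair("VNP", "Frown", "Girl", 20),
    SentiPair("ANP", "Innocent", "Girl", 10),
]
print([p.name() for p in senti_pair_sequence(detected)])
# ['Lovely Girl', 'Innocent Girl', 'Girl Frown', 'Girl Shout']
```

Keeping the frame index on each pair, rather than only a list position, lets pairs found by separate ANP and VNP extractors be merged into one sequence afterwards.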

Figure 4 shows a typical SentiPair Sequence. As we can see, the girl in the video acts differently over time. At the very beginning, the girl is smiling, so the first SentiPair is Lovely Girl. In the next frame, the girl looks a bit worried, and the second SentiPair is Innocent Girl. The third SentiPair, Girl Frown, shows that the girl looks sad, which carries a negative sentiment tendency. In the last frame, the girl fails to suppress her feelings and the SentiPair is Girl Shout. As a result, the SentiPair Sequence of this GIF video is: Lovely Girl, Innocent Girl, Girl Frown, Girl Shout.

A SentiPair Sequence describes the concepts associated with the sentiment judgment. In general, it carries two kinds of concepts: the first is an existing object (expressed by an ANP), and the second is an object's movement (expressed by a VNP).

Figure 5: An overview of the Synset Forest

Adjective Noun Pairs (ANPs) were first introduced by [3]. Compared to nouns or adjectives alone, an ANP can turn a neutral noun like dog into a phrase with strong sentiment like cute dog by adding an adjective with strong sentiment. The combined phrases also make the concepts more detectable than adjectives alone (like beautiful), which are too abstract [3].

As shown in previous work [3], an ANP phrase is detected more accurately than a single word and is easier to interpret, and the whole structure is portable and flexible because ANPs can be added or removed dynamically.

However, our experiments show that ANPs alone are not enough in the field of video sentiment analysis. We therefore introduce Verb Noun Pairs (VNPs). A VNP shares the structure of an ANP and, accordingly, its features: it is portable and easy to detect and interpret.

3.2 Synset Forest 

SentiPairs are built on three kinds of words: adjectives, verbs and nouns. In order to build ANPs/VNPs, we should first build the collection of words to choose from. We identified several criteria for a good word collection:

1. Coverage: a good word collection should cover as many domains as possible in order to convey the information.

2. Discrepancy: words with similar meanings should appear only once, to prevent ambiguous ANPs/VNPs.

3. Sentiment relation: all the words in the collection should have a clear sentiment relation.

Figure 6: The GSO-2015 Dataset

Table 1: Accuracy without attribute selection

Algorithm      Prec.   Recall  F-Score  Acc.
Naive Bayes    0.697   0.704   0.686    0.706
SMO            0.791   0.727   0.719    0.726
Logistic       0.705   0.704   0.704    0.703
AdaBoost       0.668   0.629   0.528    0.629
Rand. Forest   0.713   0.715   0.711    0.714

Table 2: Accuracy with Correlation Based Subset

Algorithm      Prec.   Recall  F-Score  Acc.
Naive Bayes    0.708   0.704   0.675    0.704
SMO            0.722   0.720   0.697    0.719
Logistic       0.731   0.727   0.703    0.726
AdaBoost       0.668   0.629   0.528    0.629
Rand. Forest   0.729   0.724   0.701    0.724

We introduce the Synset Forest to satisfy these three criteria. The Synset Forest is a forest consisting of three trees: the adjective tree, the verb tree and the noun tree. An overview of all three trees can be found in Figure 5.

In WordNet [6], synsets are interlinked by means of conceptual-semantic and lexical relations. By proposing the Synset Forest, we model a unified semantic and conceptual architecture. The Synset Forest acts as a collection of candidate words for Adjective Noun Pairs and Verb Noun Pairs. Since each node comes with a sentiment score, the weight of each ANP/VNP is determined in advance.
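A minimal sketch of such a forest is given below: three trees of scored word nodes, a global index that enforces the Discrepancy criterion (a word may occur only once), and a lookup of an ANP/VNP weight from node scores. The text does not say how node scores are combined, so the rule used here (sum of the two scores, clipped to [-1, 1]) is purely an assumption, as are all words and scores.

```python
from typing import Dict, List, Optional


class SynsetNode:
    """One word in one of the three trees, with a (hypothetical) sentiment score."""

    def __init__(self, word: str, tree: str, score: float) -> None:
        self.word = word
        self.tree = tree    # "adjective", "verb" or "noun"
        self.score = score  # assumed to lie in [-1, 1]
        self.children: List["SynsetNode"] = []


class SynsetForest:
    """Adjective, verb and noun trees; a word may occur only once in the forest."""

    TREES = ("adjective", "verb", "noun")

    def __init__(self) -> None:
        self.roots = {t: SynsetNode(t, t, 0.0) for t in self.TREES}
        self.index: Dict[str, SynsetNode] = {}

    def add(self, tree: str, word: str, score: float,
            parent: Optional[str] = None) -> SynsetNode:
        if word in self.index:
            # Discrepancy criterion: reject duplicates to avoid ambiguous pairs.
            raise ValueError(f"{word!r} is already in the forest")
        parent_node = self.index[parent] if parent is not None else self.roots[tree]
        if parent_node.tree != tree:
            raise ValueError(f"{parent!r} is not in the {tree} tree")
        node = SynsetNode(word, tree, score)
        parent_node.children.append(node)
        self.index[word] = node
        return node

    def pair_weight(self, modifier: str, noun: str) -> float:
        """Hypothetical ANP/VNP weight: sum of both node scores, clipped to [-1, 1]."""
        m, n = self.index[modifier], self.index[noun]
        if m.tree not in ("adjective", "verb") or n.tree != "noun":
            raise ValueError("expected an adjective or verb plus a noun")
        return max(-1.0, min(1.0, m.score + n.score))


forest = SynsetForest()
forest.add("adjective", "pleasant", 0.6)
forest.add("adjective", "cute", 0.8, parent="pleasant")
forest.add("noun", "animal", 0.0)
forest.add("noun", "dog", 0.0, parent="animal")
forest.add("verb", "cry", -0.7)
print(forest.pair_weight("cute", "dog"))  # ANP "cute dog"
print(forest.pair_weight("cry", "dog"))   # VNP "dog cry"
```

With these toy scores the neutral noun dog takes its sentiment entirely from the modifier, in line with the cute dog example in Section 3.1.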


3.3 GSO-2015 Dataset 

We built a new GIF video dataset from one of the most popular micro-blog providers. All the GIF videos were posted by online users and were collected automatically, yielding 40,000+ distinct candidates. These candidates were then manually labeled following the GIF Sentiment Ontology. This work was made possible by crowd intelligence: we recruited 7 workers. Each worker was shown one GIF video and was expected to accomplish two tasks. Task 1 is to depict the given GIF using a SentiPair Sequence; more specifically, for each GIF, the worker chooses the SentiPairs, each consisting of either an adjective and a noun (ANP) or a verb and a noun (VNP). Figure 4 illustrates the flow of SentiPairs and the corresponding GIF. In Task 2, workers were expected to give the GIF an overall sentiment judgment (Positive/Negative/Neutral/Can't Judge).
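The two annotation tasks suggest a simple record format. The sketch below is one possible layout, assuming one record per worker per GIF; the field names and the GIF identifier are hypothetical, and the sentiment label in the example is illustrative only, not taken from the dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Task 2 label set, as listed in the annotation protocol.
SENTIMENT_LABELS = {"Positive", "Negative", "Neutral", "Can't Judge"}


@dataclass
class Annotation:
    gif_id: str                              # hypothetical identifier
    worker_id: int                           # which of the 7 workers produced it
    senti_pairs: List[Tuple[str, str, str]]  # Task 1: (kind, modifier, noun) in order
    sentiment: str                           # Task 2: overall judgment

    def validate(self) -> None:
        """Reject labels outside the protocol and malformed SentiPairs."""
        if self.sentiment not in SENTIMENT_LABELS:
            raise ValueError(f"invalid sentiment label: {self.sentiment!r}")
        for kind, modifier, noun in self.senti_pairs:
            if kind not in ("ANP", "VNP"):
                raise ValueError(f"invalid SentiPair kind: {kind!r}")
            if not modifier or not noun:
                raise ValueError("a SentiPair needs both a modifier and a noun")


record = Annotation(
    gif_id="gif-0001",
    worker_id=3,
    senti_pairs=[("ANP", "Lovely", "Girl"), ("ANP", "Innocent", "Girl"),
                 ("VNP", "Frown", "Girl"), ("VNP", "Shout", "Girl")],
    sentiment="Negative",  # illustrative label only
)
record.validate()  # passes silently
```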

4. EXPERIMENTS 

We designed two experiments. The first experiment evaluates the performance of the proposed SentiPair Sequences, applying different models with accuracy as the evaluation metric. In addition, we explore the possibility of simplifying the problem through feature selection.


Table 3: Accuracy of ANP only, VNP only and SentiPair

Type        SMO     Logistic  Rand. Forest
ANP only    0.702   0.696     0.696
VNP only    0.658   0.662     0.655
SentiPair   0.726   0.703     0.714


The second experiment compares the performance of our framework with the state-of-the-art representation VSO [3], again using accuracy as the evaluation metric.
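Tables 1 and 2 report precision, recall, F-score and accuracy for a three-class problem. The text does not state how per-class scores are averaged, so the sketch below assumes class-frequency weighting (the scheme of WEKA's "Weighted Avg." row). Under that weighting, weighted recall always equals accuracy, because the class-weighted sum of per-class recalls reduces to total correct predictions over all instances.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def weighted_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus precision, recall and F-score averaged with class-frequency weights."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predicted instances per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    result = {"accuracy": sum(hits.values()) / n,
              "precision": 0.0, "recall": 0.0, "f_score": 0.0}
    for c, true_c in support.items():  # classes absent from y_true get weight 0
        tp = hits[c]
        p_c = tp / predicted[c] if predicted[c] else 0.0
        r_c = tp / true_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = true_c / n
        result["precision"] += w * p_c
        result["recall"] += w * r_c
        result["f_score"] += w * f_c
    return result


m = weighted_metrics(["pos", "pos", "neg", "neu"], ["pos", "neu", "neg", "neu"])
print(m)
```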











We use the GSO-2015 dataset to train the sentiment classifiers. One advantage of GSO is its ability to convey temporal information (through SentiPair combination). The training set consists of 1124 positive instances (60.3%), 146 negative instances (7.8%) and 599 neutral ones (32.1%).
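It can help to read the accuracies in Tables 1 to 3 against the class balance. The sketch below recomputes the class shares from the counts above, together with the accuracy of a trivial classifier that always predicts the majority class; this is an illustrative reference point, not a baseline evaluated in this paper.

```python
# Class counts of the training set as given in the text.
counts = {"positive": 1124, "negative": 146, "neutral": 599}
total = sum(counts.values())  # 1869 instances
shares = {label: c / total for label, c in counts.items()}

# A classifier that always predicts the majority class reaches this accuracy.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

for label, share in shares.items():
    print(f"{label:>8}: {share:.1%}")
print(f"majority-class baseline ({majority_label}): {majority_baseline:.1%}")
```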

4.1 Baseline 

We compare the performance of the GIF Sentiment Ontology (GSO) on the GSO-2015 dataset with VSO [3], the state-of-the-art image sentiment analysis framework.

4.2 GIF Sentiment Analysis with Bag of Words

We first explore a simple way to obtain sentiment from SentiPairs; a straightforward choice is bag of words (BoW). Table 1 shows the performance of BoW with different classifiers. We find that SMO and Logistic Regression are the best choices (the same as in [3]). Moreover, most classifiers correctly classify a large proportion of the test instances (around 70%; AdaBoost is the exception at 62.9%). In other words, the choice of classifier matters relatively little once SentiPair Sequences are available.
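A minimal BoW encoding of SentiPair Sequences is sketched below, assuming each SentiPair is identified by its name string; the count vectors it produces are the kind of input a classifier such as SMO or logistic regression is trained on. The second GIF and its SentiPairs are made up.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_vocabulary(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Give every distinct SentiPair name in the training data its own column index."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for name in seq:
            vocab.setdefault(name, len(vocab))
    return vocab


def bow_vector(seq: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Count how often each known SentiPair occurs; unknown pairs are ignored."""
    vec = [0] * len(vocab)
    for name, count in Counter(seq).items():
        if name in vocab:
            vec[vocab[name]] = count
    return vec


train = [
    ["Lovely Girl", "Innocent Girl", "Girl Frown", "Girl Shout"],
    ["Cute Dog", "Dog Jump", "Cute Dog"],  # hypothetical second GIF
]
vocab = build_vocabulary(train)
X = [bow_vector(seq, vocab) for seq in train]
print(X)  # [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 2, 1]]
```

This encoding discards the order of SentiPairs within a sequence: a reversed sequence maps to the same vector.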

We also explore the possibility of simplifying the problem through feature selection. We used the Correlation Based Subset method (as implemented in WEKA [7]); the results are shown in Table 2. Feature selection provides similar detection accuracy while using fewer mid-level attributes.
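Correlation-based subset selection scores a candidate subset S of k attributes with Hall's merit, Merit(S) = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean attribute-class correlation and r_ff the mean attribute-attribute correlation; subsets whose attributes predict the class but are not redundant with one another score highest. The sketch below is a simplification, not WEKA's CfsSubsetEval: it uses absolute Pearson correlation against a binary class (e.g., positive vs. not positive) and a plain greedy forward search, and WEKA's own correlation measure and search strategy may differ. The data are toy values.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def merit(subset: List[int], features: List[List[float]], target: List[float]) -> float:
    """Hall's CFS merit: k * r_cf / sqrt(k + k(k-1) * r_ff)."""
    k = len(subset)
    r_cf = mean(abs(pearson(features[i], target)) for i in subset)
    if k == 1:
        return r_cf  # the formula reduces to r_cf for a single attribute
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = mean(abs(pearson(features[i], features[j])) for i, j in pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features: List[List[float]], target: List[float]) -> List[int]:
    """Forward selection: add the attribute that most improves the merit; stop when none does."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(features)))
    while remaining:
        # Highest merit first; on ties prefer the lower attribute index.
        score, i = max(((merit(selected + [j], features, target), j) for j in remaining),
                       key=lambda s: (s[0], -s[1]))
        if score <= best:
            break
        selected.append(i)
        remaining.remove(i)
        best = score
    return selected


target = [1, 1, 0, 0, 1, 0]
features = [
    [1, 1, 0, 0, 1, 0],  # attribute 0: identical to the class
    [1, 1, 0, 0, 1, 0],  # attribute 1: an exact duplicate of attribute 0
    [0, 1, 0, 1, 0, 1],  # attribute 2: weakly related to the class
]
print(greedy_cfs(features, target))  # [0]
```

The duplicate attribute is never added because it raises r_ff as much as r_cf, which is how CFS keeps accuracy while using fewer attributes.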

4.3 SentiPair vs. ANP

To evaluate the proposed mid-level representation, we compare the performance of the SentiPair Sequence with ANPs alone and VNPs alone. The numbers in Table 3 are accuracies, i.e., the proportion of correctly classified instances. The results indicate that the SentiPair Sequence outperforms ANPs. We believe this is because GIF sentiment is affected by motion information, which is expressed by VNPs.
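The three rows of Table 3 can be read as the same classifier trained on three views of each sequence. A minimal sketch of how such views might be derived is shown below, assuming each SentiPair is stored as a (kind, name) pair; this data layout and the function name are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (kind, name), e.g. ("VNP", "Girl Frown")


def view(sequence: List[Pair], kinds: Set[str]) -> List[str]:
    """Keep only SentiPairs of the given kinds, preserving their order of occurrence."""
    return [name for kind, name in sequence if kind in kinds]


figure4 = [("ANP", "Lovely Girl"), ("ANP", "Innocent Girl"),
           ("VNP", "Girl Frown"), ("VNP", "Girl Shout")]

views = {
    "ANP only": view(figure4, {"ANP"}),
    "VNP only": view(figure4, {"VNP"}),
    "SentiPair": view(figure4, {"ANP", "VNP"}),
}
for label, names in views.items():
    print(f"{label:>9}: {names}")
```

On the Figure 4 example, the ANP-only view keeps just Lovely Girl and Innocent Girl and so drops the frowning and shouting that carry the negative turn, which illustrates the motion-based explanation above.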

5. CONCLUSION 

In this paper, we explored the essential relationship between abstractions and sentiment in GIF videos. We proposed the SentiPair Sequence as a mid-level representation for GIF videos. SentiPairs are built on the Synset Forest, which consists of three trees of synsets; each synset is a word with a specific sense. Our experiments suggest that the SentiPair Sequence can abstract a video's spatio-temporal information and that it outperforms competing representations such as ANP or VNP alone. The ontology is established automatically, and the experiments show a prediction accuracy of 72.6%.


We leave the first step (such as object detection) to the relevant research communities, because we believe this step can be advanced in parallel; every improvement made in the first step will improve the overall performance. We have also released our dataset, GSO-2015, to the public. GSO-2015 contains more than 6,000 manually annotated GIF videos selected from more than 40,000 candidates, and each video is labeled with both its sentiment and its SentiPair Sequence. We believe it will be helpful for future research.